1.
Circ Res ; 2024 Apr 19.
Article in English | MEDLINE | ID: mdl-38639096

ABSTRACT

BACKGROUND: While our understanding of the single-cell gene expression patterns underlying the transformation of vascular cell types during the progression of atherosclerosis is rapidly improving, the clinical and pathophysiological relevance of these changes remains poorly understood. METHODS: Single-cell RNA sequencing data generated with SmartSeq2 (≈8000 genes/cell) in nearly 19 000 single cells isolated during atherosclerosis progression in Ldlr-/-Apob100/100 mice with human-like plasma lipoproteins and from humans with asymptomatic and symptomatic carotid plaques was clustered into multiple subtypes. For clinical and pathophysiological context, the advanced-stage and symptomatic subtype clusters were integrated with 135 tissue-specific (atherosclerotic aortic wall, mammary artery, liver, skeletal muscle, and visceral and subcutaneous fat) gene-regulatory networks (GRNs) inferred from 600 coronary artery disease patients in the STARNET (Stockholm-Tartu Atherosclerosis Reverse Network Engineering Task) study. RESULTS: Advanced stages of atherosclerosis progression and symptomatic carotid plaques were largely characterized by 3 smooth muscle cell (SMC) and 3 macrophage subtype clusters with extracellular matrix organization/osteogenic (SMC) and M1-type proinflammatory/Trem2-high lipid-associated (macrophage) phenotypes. Integrative analysis of these 6 clusters with STARNET revealed significant enrichments of 3 arterial wall GRNs: GRN33 (macrophage), GRN39 (SMC), and GRN122 (macrophage), with major contributions to coronary artery disease heritability and strong associations with clinical scores of coronary atherosclerosis severity (SYNTAX/Duke scores). The presence and pathophysiological relevance of GRN39 were verified in 5 independent RNAseq data sets obtained from the human coronary and aortic artery and from primary SMCs, and by targeting its top key drivers, FRZB and ALCAM, in cultured human vascular SMCs.
CONCLUSIONS: By identifying and integrating the most gene-rich single-cell subclusters of atherosclerosis to date with a coronary artery disease framework of GRNs, GRN39 was identified and independently validated as being critical for the transformation of contractile SMCs into an osteogenic phenotype promoting advanced-stage, symptomatic atherosclerosis.

2.
NPJ Syst Biol Appl ; 10(1): 24, 2024 Mar 06.
Article in English | MEDLINE | ID: mdl-38448436

ABSTRACT

Genome-scale metabolic models are powerful tools for understanding cellular physiology. Flux balance analysis (FBA), in particular, is an optimization-based approach widely employed for predicting metabolic phenotypes. In model microbes such as Escherichia coli, FBA has been successful at predicting essential genes, i.e. genes whose deletion impairs survival. A central assumption in this approach is that both wild type and deletion strains optimize the same fitness objective. Although the optimality assumption may hold for the wild type metabolic network, deletion strains are not subject to the same evolutionary pressures and knock-out mutants may steer their metabolism to meet other objectives for survival. Here, we present FlowGAT, a hybrid FBA-machine learning strategy for predicting essentiality directly from wild type metabolic phenotypes. The approach is based on a graph-structured representation of metabolic fluxes predicted by FBA, where nodes correspond to enzymatic reactions and edges quantify the propagation of metabolite mass flow between a reaction and its neighbours. We integrate this information into a graph neural network that can be trained on knock-out fitness assay data. Comparisons across different model architectures reveal that FlowGAT predictions for E. coli are close to those of FBA for several growth conditions. This suggests that essentiality of enzymatic genes can be predicted by exploiting the inherent network structure of metabolism. Our approach demonstrates the benefits of combining the mechanistic insights afforded by genome-scale models with the ability of deep learning to infer patterns from complex datasets.
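
As a rough illustration of the graph representation described in this abstract, the sketch below turns a flux vector into a weighted reaction graph. The toy network, the flux values and the function name `mass_flow_graph` are assumptions made for this example. The weighting rule (each metabolite's flow is passed from its producers to its consumers in proportion to consumption) is one common way to define such edges and may differ from the construction FlowGAT actually uses. Fluxes are assumed to be non-negative.

```python
import numpy as np

def mass_flow_graph(S, v):
    """Edge weight W[i, j]: metabolite mass produced by reaction i and consumed by reaction j.

    S is a metabolites x reactions stoichiometric matrix and v a vector of
    non-negative steady-state fluxes (an assumption of this sketch).
    """
    S = np.asarray(S, dtype=float)
    v = np.asarray(v, dtype=float)
    produced = np.clip(S, 0, None) * v    # produced[k, i]: flow of metabolite k out of reaction i
    consumed = np.clip(-S, 0, None) * v   # consumed[k, j]: flow of metabolite k into reaction j
    total = produced.sum(axis=1)          # total throughput of each metabolite
    # Metabolites that carry no flow contribute nothing; avoid dividing by zero.
    inv_total = np.divide(1.0, total, out=np.zeros_like(total), where=total > 0)
    return produced.T @ (consumed * inv_total[:, None])

# Hypothetical toy network: R1: -> A, R2: A -> B, R3: A -> C, R4: B ->, R5: C ->
S = np.array([
    [1, -1, -1,  0,  0],   # A
    [0,  1,  0, -1,  0],   # B
    [0,  0,  1,  0, -1],   # C
])
v = np.array([10.0, 6.0, 4.0, 6.0, 4.0])  # made-up flux vector satisfying S @ v = 0

W = mass_flow_graph(S, v)
```

In this toy case the uptake reaction R1 passes 6 units of flow to R2 and 4 units to R3, so `W[0, 1]` and `W[0, 2]` carry those weights. A graph neural network could use `W` as its weighted adjacency matrix.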


Subjects
Escherichia coli; Machine Learning; Escherichia coli/genetics; Neural Networks, Computer; Phenotype
3.
ArXiv ; 2024 Jan 11.
Article in English | MEDLINE | ID: mdl-38259344

ABSTRACT

Multivariate Mendelian randomization (MVMR) is a statistical technique that uses sets of genetic instruments to estimate the direct causal effects of multiple exposures on an outcome of interest. At genomic loci with pleiotropic gene regulatory effects, that is, loci where the same genetic variants are associated with multiple nearby genes, MVMR can potentially be used to predict candidate causal genes. However, consensus in the field dictates that the genetic instruments in MVMR must be independent (not in linkage disequilibrium), which is usually not possible when considering a group of candidate genes from the same locus. Here we used causal inference theory to show that MVMR with correlated instruments satisfies the instrumental set condition. This is a classical result by Brito and Pearl (2002) for structural equation models that guarantees the identifiability of individual causal effects in situations where multiple exposures collectively, but not individually, separate a set of instrumental variables from an outcome variable. Extensive simulations confirmed the validity and usefulness of these theoretical results even at modest sample sizes (n ≳ 500-1000). Importantly, the causal effect estimates remain unbiased and their variance small when instruments are highly correlated. We applied MVMR with correlated instrumental variable sets at genome-wide significant loci for coronary artery disease (CAD) risk using expression Quantitative Trait Loci (eQTL) data from seven vascular and metabolic tissues in the STARNET study. Our method predicts causal genes at twelve loci, each associated with multiple colocated genes in multiple tissues. We confirm causal roles for PHACTR1 and ADAMTS7 in arterial tissues, among others.
However, the extensive degree of regulatory pleiotropy across tissues and the limited number of causal variants in each locus still require that MVMR is run on a tissue-by-tissue basis, and testing all gene-tissue pairs with cis-eQTL associations at a given locus in a single model to predict causal gene-tissue combinations remains infeasible. Our results show that within tissues, MVMR with dependent, as opposed to independent, sets of instrumental variables significantly expands the scope for predicting causal genes in disease risk loci with pleiotropic regulatory effects. However, considering risk loci with regulatory pleiotropy that also spans across tissues remains an unsolved problem.
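
The idea of estimating several direct effects from a set of correlated instruments can be sketched with a small simulation. This is a minimal example under stated assumptions, not the authors' implementation: it uses a generic two-stage least squares estimator on individual-level data, Gaussian stand-ins for genotype dosages, and made-up effect sizes (`A`, `beta_true`).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000

# Three instruments with pairwise correlation 0.8 (variants in linkage disequilibrium).
cov = np.full((3, 3), 0.8) + 0.2 * np.eye(3)
G = rng.multivariate_normal(np.zeros(3), cov, size=n)

U = rng.normal(size=n)                      # unobserved confounder
A = np.array([[1.0, 0.0],
              [0.2, 0.2],
              [0.0, 1.0]])                  # instrument -> exposure effects (hypothetical)
X = G @ A + U[:, None] + rng.normal(size=(n, 2))
beta_true = np.array([0.5, -0.3])           # direct causal effects on the outcome (hypothetical)
y = X @ beta_true + U + rng.normal(size=n)

# Stage 1: project the exposures onto the instruments.
X_hat = G @ np.linalg.lstsq(G, X, rcond=None)[0]
# Stage 2: regress the outcome on the projected exposures.
beta_2sls = np.linalg.lstsq(X_hat, y, rcond=None)[0]

# Ordinary least squares for comparison; it is biased by the confounder U.
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
```

In this setup no single exposure separates the instruments from the outcome, but the two exposures jointly do, which is the situation the instrumental set condition covers. The two-stage estimate should land close to `beta_true`, while the ordinary regression is pulled away from it by the confounder.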

4.
Pharmacol Ther ; 250: 108530, 2023 10.
Article in English | MEDLINE | ID: mdl-37708996

ABSTRACT

Neurodevelopmental disorders (NDDs) impact multiple aspects of an individual's functioning, including social interactions, communication, and behaviors. The underlying biological mechanisms of NDDs are not yet fully understood, and pharmacological treatments have been limited in their effectiveness, in part due to the complex nature of these disorders and the heterogeneity of symptoms across individuals. Identifying genetic loci associated with NDDs can help in understanding biological mechanisms and potentially lead to the development of new treatments. However, the polygenic nature of these complex disorders has made identifying new treatment targets from genome-wide association studies (GWAS) challenging. Recent advances in the fields of big data and high-throughput tools have provided radically new insights into the underlying biological mechanism of NDDs. This paper reviews various big data approaches, including classical and more recent techniques like deep learning, which can identify potential treatment targets from GWAS and other omics data, with a particular emphasis on NDDs. We also emphasize the increasing importance of explainable and causal machine learning (ML) methods that can aid in identifying genes, molecular pathways, and more complex biological processes that may be future targets of intervention in these disorders. We conclude that these new developments in genetics and ML hold promise for advancing our understanding of NDDs and identifying novel treatment targets.


Subjects
Genome-Wide Association Study; Neurodevelopmental Disorders; Humans; Big Data; Neurodevelopmental Disorders/drug therapy; Neurodevelopmental Disorders/genetics; Algorithms; Machine Learning
5.
Front Endocrinol (Lausanne) ; 14: 1186252, 2023.
Article in English | MEDLINE | ID: mdl-37745713

ABSTRACT

Genome-wide association meta-analysis (GWAMA) by the Cortisol Network (CORNET) consortium identified genetic variants spanning the SERPINA6/SERPINA1 locus on chromosome 14 associated with morning plasma cortisol, cardiovascular disease (CVD), and SERPINA6 mRNA expression encoding corticosteroid-binding globulin (CBG) in the liver. These and other findings indicate that higher plasma cortisol levels are causally associated with CVD; however, the mechanisms by which variations in CBG lead to CVD are undetermined. Using genomic and transcriptomic data from the Stockholm-Tartu Atherosclerosis Reverse Network Engineering Task (STARNET) study, we identified plasma cortisol-linked single-nucleotide polymorphisms (SNPs) that are trans-associated with genes from seven different vascular and metabolic tissues, finding the highest representation of trans-genes in the liver, subcutaneous fat, and visceral abdominal fat [false discovery rate (FDR) = 15%]. We identified a subset of cortisol-associated trans-genes that are putatively regulated by the glucocorticoid receptor (GR), the primary transcription factor activated by cortisol. Using causal inference, we identified GR-regulated trans-genes that are responsible for the regulation of tissue-specific gene networks. Cis-expression Quantitative Trait Loci (eQTLs) were used as genetic instruments for identification of pairwise causal relationships from which gene networks could be reconstructed. Gene networks were identified in the liver, subcutaneous fat, and visceral abdominal fat, including a high confidence gene network specific to subcutaneous adipose (FDR = 10%) under the regulation of the interferon regulatory transcription factor, IRF2. These data identify a plausible pathway through which variation in liver CBG production perturbs cortisol-regulated gene networks in peripheral tissues and thereby promotes CVD.


Subjects
Cardiovascular Diseases; Glucocorticoids; Transcortin; Humans; Adipose Tissue; Cardiovascular Diseases/genetics; Gene Regulatory Networks; Genome-Wide Association Study; Heart Disease Risk Factors; Hydrocortisone; Liver; Receptors, Glucocorticoid/genetics; Risk Factors; Transcortin/genetics
6.
Drug Discov Today ; 28(10): 103737, 2023 10.
Article in English | MEDLINE | ID: mdl-37591410

ABSTRACT

To discover new drugs is to seek and to prove causality. As an emerging approach leveraging human knowledge and creativity, data, and machine intelligence, causal inference holds the promise of reducing cognitive bias and improving decision-making in drug discovery. Although it has been applied across the value chain, the concepts and practice of causal inference remain obscure to many practitioners. This article offers a nontechnical introduction to causal inference, reviews its recent applications, and discusses opportunities and challenges of adopting the causal language in drug discovery and development.


Subjects
Drug Discovery; Knowledge; Humans; Bias; Causality
7.
Nat Cardiovasc Res ; 1(1): 85-100, 2022 Jan.
Article in English | MEDLINE | ID: mdl-36276926

ABSTRACT

Coronary atherosclerosis results from the delicate interplay of genetic and exogenous risk factors, principally taking place in metabolic organs and the arterial wall. Here we show that 224 gene-regulatory coexpression networks (GRNs) identified by integrating genetic and clinical data from patients with (n = 600) and without (n = 250) coronary artery disease (CAD) with RNA-seq data from seven disease-relevant tissues in the Stockholm-Tartu Atherosclerosis Reverse Network Engineering Task (STARNET) study largely capture this delicate interplay, explaining >54% of CAD heritability. Within 89 cross-tissue GRNs associated with clinical severity of CAD, 374 endocrine factors facilitated inter-organ interactions, primarily along an axis from adipose tissue to the liver (n = 152). This axis was independently replicated in genetically diverse mouse strains and by injection of recombinant forms of adipose endocrine factors (EPDR1, FCN2, FSTL3 and LBP) that markedly altered blood lipid and glucose levels in mice. Altogether, the STARNET database and the associated GRN browser (http://starnet.mssm.edu) provide a multiorgan framework for exploration of the molecular interplay between cardiometabolic disorders and CAD.

8.
Front Endocrinol (Lausanne) ; 13: 949061, 2022.
Article in English | MEDLINE | ID: mdl-36060942

ABSTRACT

Hormones act within highly dynamic systems, and much of the phenotypic response to variation in hormone levels is mediated by changes in gene expression. The increase in the number and power of large genetic association studies has led to the identification of hormone-linked genetic variants. However, the biological mechanisms underpinning the majority of these loci are poorly understood. The advent of affordable, high-throughput next-generation sequencing and readily available transcriptomic databases has shown that many of these genetic variants also associate with variation in gene expression levels as expression Quantitative Trait Loci (eQTLs). In addition to further dissecting complex genetic variation, eQTLs have been applied as tools for causal inference. Many hormone networks are driven by transcription factors, and many of these genes can be linked to eQTLs. In this mini-review, we demonstrate how causal inference and gene networks can be used to describe the impact of hormone-linked genetic variation upon the transcriptome within an endocrinology context.


Subjects
Gene Regulatory Networks; Quantitative Trait Loci; Hormones; Polymorphism, Single Nucleotide; Transcriptome
9.
G3 (Bethesda) ; 12(2), 2022 02 04.
Article in English | MEDLINE | ID: mdl-34864982

ABSTRACT

Random effects models are popular statistical models for detecting and correcting spurious sample correlations due to hidden confounders in genome-wide gene expression data. In applications where some confounding factors are known, estimating simultaneously the contribution of known and latent variance components in random effects models is a challenge that has so far relied on numerical gradient-based optimizers to maximize the likelihood function. This is unsatisfactory because the resulting solution is poorly characterized and the efficiency of the method may be suboptimal. Here, we prove analytically that maximum-likelihood latent variables can always be chosen orthogonal to the known confounding factors, in other words, that maximum-likelihood latent variables explain sample covariances not already explained by known factors. Based on this result, we propose a restricted maximum-likelihood (REML) method that estimates the latent variables by maximizing the likelihood on the restricted subspace orthogonal to the known confounding factors and show that this reduces to probabilistic principal component analysis on that subspace. The method then estimates the variance-covariance parameters by maximizing the remaining terms in the likelihood function given the latent variables, using a newly derived analytic solution for this problem. Compared to gradient-based optimizers, our method attains greater or equal likelihood values, can be computed using standard matrix operations, results in latent factors that do not overlap with any known factors, and has a runtime reduced by several orders of magnitude. Hence, the REML method facilitates the application of random effects modeling strategies for learning latent variance components to much larger gene expression datasets than possible with current methods.
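
The central step described in this abstract, estimating latent factors on the subspace orthogonal to the known confounders, can be sketched as follows. This is a simplified illustration and not the authors' full method: it only performs the projection and eigenvector step, omits the variance-covariance parameter estimation, and skips gene-wise centring. The function name `latent_factors` and the simulated data are assumptions of this example.

```python
import numpy as np

def latent_factors(Y, Z, k):
    """Top-k latent factors of Y (samples x genes) orthogonal to known covariates Z (samples x d)."""
    n, p = Y.shape
    Q, _ = np.linalg.qr(Z)                 # orthonormal basis of the known-confounder space
    Y_perp = Y - Q @ (Q.T @ Y)             # remove everything the known factors explain
    C = Y_perp @ Y_perp.T / p              # sample-by-sample covariance on the restricted subspace
    eigvals, eigvecs = np.linalg.eigh(C)   # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]
    return eigvecs[:, top], eigvals[top]

rng = np.random.default_rng(1)
n, p = 50, 200
Z = rng.normal(size=(n, 2))                # hypothetical known confounders (e.g. batch, age)
hidden = rng.normal(size=(n, 3))           # hypothetical hidden confounders
Y = Z @ rng.normal(size=(2, p)) + hidden @ rng.normal(size=(3, p)) + rng.normal(size=(n, p))

X, lam = latent_factors(Y, Z, k=3)
```

Because the eigenvectors are computed after projecting out `Z`, the returned factors cannot overlap with the known covariates, and the whole computation uses only standard matrix operations.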


Subjects
Genome; Models, Statistical; Gene Expression; Likelihood Functions
10.
BMC Bioinformatics ; 22(1): 525, 2021 Oct 27.
Article in English | MEDLINE | ID: mdl-34706640

ABSTRACT

BACKGROUND: Molecular interaction networks summarize complex biological processes as graphs, whose structure is informative of biological function at multiple scales. Simultaneously, omics technologies measure the variation or activity of genes, proteins, or metabolites across individuals or experimental conditions. Integrating the complementary viewpoints of biological networks and omics data is an important task in bioinformatics, but existing methods treat networks as discrete structures, which are intrinsically difficult to integrate with continuous node features or activity measures. Graph neural networks map graph nodes into a low-dimensional vector space representation, and can be trained to preserve both the local graph structure and the similarity between node features. RESULTS: We studied the representation of transcriptional, protein-protein and genetic interaction networks in E. coli and mouse using graph neural networks. We found that such representations explain a large proportion of variation in gene expression data, and that using gene expression data as node features improves the reconstruction of the graph from the embedding. We further proposed a new end-to-end Graph Feature Auto-Encoder framework for the prediction of node features utilizing the structure of the gene networks, which is trained on the feature prediction task, and showed that it performs better at predicting unobserved node features than regular MultiLayer Perceptrons. When applied to the problem of imputing missing data in single-cell RNAseq data, the Graph Feature Auto-Encoder utilizing our new graph convolution layer called FeatGraphConv outperformed a state-of-the-art imputation method that does not use protein interaction information, showing the benefit of integrating biological networks and omics data with our proposed approach. 
CONCLUSION: Our proposed Graph Feature Auto-Encoder framework is a powerful approach for integrating and exploiting the close relation between molecular interaction networks and functional genomics data.
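
To make the idea of mapping network nodes and their features into a low-dimensional space more concrete, here is a minimal sketch of a single graph-convolution step. It uses the widely known normalised propagation rule from standard graph convolutional networks as a stand-in; it is not the FeatGraphConv layer named in the abstract, whose details are not given here. The toy network, feature matrix and weights are hypothetical.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])             # add self-loops so a node keeps its own features
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)

# Hypothetical 4-gene interaction network (undirected path 0-1-2-3).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
rng = np.random.default_rng(2)
H = rng.normal(size=(4, 5))     # node features, e.g. expression of each gene in 5 samples
W = rng.normal(size=(5, 2))     # weights; random here, learned in a real model
embedding = gcn_layer(A, H, W)  # 2-dimensional representation of each gene
```

Each row of `embedding` mixes a gene's own features with those of its network neighbours, which is how a model of this kind can use graph structure when predicting unobserved node features.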


Subjects
Escherichia coli; Neural Networks, Computer; Animals; Computational Biology; Gene Regulatory Networks; Mice; Proteins
11.
Sci Rep ; 11(1): 8294, 2021 04 15.
Article in English | MEDLINE | ID: mdl-33859262

ABSTRACT

Migraine attacks are delimited, allowing investigation of changes during and outside an attack. Gene expression fluctuates according to environmental and endogenous events; we therefore hypothesized that migraine-specific changes in RNA expression exist during and outside a spontaneous migraine attack. Twenty-seven migraine patients were assessed during a spontaneous migraine attack, including headache characteristics and treatment effect. Blood samples were taken during the attack, two hours after treatment, on a headache-free day and after a cold pressor test. RNA-Sequencing, genotyping, and steroid profiling were performed. RNA-Sequencing data were analyzed at gene level (differential expression analysis) and at network level, and genomic and transcriptomic data were integrated. After subtracting non-migraine-specific genes, we found 29 genes differentially expressed between 'attack' and 'after treatment' that function in fatty acid oxidation, signaling pathways and immune-related pathways. Network analysis revealed mechanisms affected by changes in gene interactions, e.g. 'ion transmembrane transport'. Integration of genomic and transcriptomic data revealed pathways related to sumatriptan treatment, i.e. '5HT1 type receptor mediated signaling pathway'. In conclusion, we uniquely investigated intra-individual changes in gene expression during a migraine attack. We revealed both genes and pathways potentially involved in the pathophysiology of migraine and/or migraine treatment.


Subjects
Migraine Disorders/genetics; Transcriptome/genetics; Adolescent; Adult; Aged; Epistasis, Genetic/drug effects; Female; Humans; Male; Middle Aged; Migraine Disorders/drug therapy; RNA/genetics; RNA/metabolism; Sumatriptan/pharmacology; Sumatriptan/therapeutic use; Young Adult
12.
J Hum Genet ; 66(6): 625-636, 2021 Jun.
Article in English | MEDLINE | ID: mdl-33469137

ABSTRACT

The stress hormone cortisol modulates fuel metabolism, cardiovascular homoeostasis, mood, inflammation and cognition. The CORtisol NETwork (CORNET) consortium previously identified a single locus associated with morning plasma cortisol. Identifying additional genetic variants that explain more of the variance in cortisol could provide new insights into cortisol biology and provide statistical power to test the causative role of cortisol in common diseases. The CORNET consortium extended its genome-wide association meta-analysis for morning plasma cortisol from 12,597 to 25,314 subjects and from ~2.2 M to ~7 M SNPs, in 17 population-based cohorts of European ancestries. We confirmed the genetic association with SERPINA6/SERPINA1. This locus contains genes encoding corticosteroid binding globulin (CBG) and α1-antitrypsin. Expression quantitative trait loci (eQTL) analyses undertaken in the STARNET cohort of 600 individuals showed that specific genetic variants within the SERPINA6/SERPINA1 locus influence expression of SERPINA6 rather than SERPINA1 in the liver. Moreover, trans-eQTL analysis demonstrated effects on adipose tissue gene expression, suggesting that variations in CBG levels have an effect on delivery of cortisol to peripheral tissues. Two-sample Mendelian randomisation analyses provided evidence that each genetically-determined standard deviation (SD) increase in morning plasma cortisol was associated with increased odds of chronic ischaemic heart disease (0.32, 95% CI 0.06-0.59) and myocardial infarction (0.21, 95% CI 0.00-0.43) in UK Biobank and similarly in CARDIoGRAMplusC4D. These findings reveal a causative pathway for CBG in determining cortisol action in peripheral tissues and thereby contributing to the aetiology of cardiovascular disease.
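
For readers unfamiliar with two-sample Mendelian randomisation, the sketch below shows how a causal estimate can be computed from per-variant summary statistics. It uses the fixed-effect inverse-variance weighted estimator with first-order weights, a standard choice that the abstract does not confirm was the one applied in the study. All numbers are made up for illustration and the variants are assumed to be independent.

```python
import math

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Inverse-variance weighted causal estimate from per-variant summary statistics."""
    weights = [bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]  # per-variant Wald ratios
    estimate = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    std_error = math.sqrt(1.0 / sum(weights))
    return estimate, std_error

# Made-up summary statistics for three independent variants (not from the study).
beta_cortisol = [0.10, 0.20, 0.15]   # SD change in exposure per allele
beta_disease = [0.03, 0.06, 0.045]   # log-odds change in outcome per allele
se_disease = [0.01, 0.02, 0.015]

estimate, std_error = ivw_estimate(beta_cortisol, beta_disease, se_disease)
ci = (estimate - 1.96 * std_error, estimate + 1.96 * std_error)
```

Here `estimate` is the change in log-odds of the outcome per SD increase in the exposure, and `ci` is an approximate 95% confidence interval, the same form in which the abstract reports its results.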


Subjects
Cardiovascular Diseases/genetics; Myocardial Infarction/genetics; Transcortin/genetics; alpha 1-Antitrypsin/genetics; Adrenal Cortex Hormones/blood; Adult; Biological Specimen Banks; Cardiovascular Diseases/blood; Cardiovascular Diseases/epidemiology; Cardiovascular Diseases/pathology; Female; Gene Expression Regulation; Genetic Predisposition to Disease; Genome-Wide Association Study; Humans; Liver/metabolism; Liver/pathology; Male; Mendelian Randomization Analysis; Middle Aged; Myocardial Infarction/blood; Myocardial Infarction/pathology; Polymorphism, Single Nucleotide/genetics; Quantitative Trait Loci/genetics; United Kingdom
13.
Mol Omics ; 17(2): 241-251, 2021 04 01.
Article in English | MEDLINE | ID: mdl-33438713

ABSTRACT

Causal gene networks model the flow of information within a cell. Reconstructing causal networks from omics data is challenging because correlation does not imply causation. When genomics and transcriptomics data from a segregating population are combined, genomic variants can be used to orient the direction of causality between gene expression traits. Instrumental variable methods use a local expression quantitative trait locus (eQTL) as a randomized instrument for a gene's expression level, and assign target genes based on distal eQTL associations. Mediation-based methods additionally require that distal eQTL associations are mediated by the source gene. A detailed comparison between these methods has not yet been conducted, due to the lack of a standardized implementation of different methods, the limited sample size of most multi-omics datasets, and the absence of ground-truth networks for most organisms. Here we used Findr, a software package providing uniform implementations of instrumental variable, mediation, and coexpression-based methods, a recent dataset of 1012 segregants from a cross between two budding yeast strains, and the Yeastract database of known transcriptional interactions to compare causal gene network inference methods. We found that causal inference methods result in a significant overlap with the ground-truth, whereas coexpression did not perform better than random. A subsampling analysis revealed that the performance of mediation saturates at large sample sizes, due to a loss of sensitivity when residual correlations become significant. Instrumental variable methods on the other hand contain false positive predictions, due to genomic linkage between eQTL instruments. Instrumental variable and mediation-based methods also have complementary roles for identifying causal genes underlying transcriptional hotspots. 
Instrumental variable methods correctly predicted STB5 targets for a hotspot centred on the transcription factor STB5, whereas mediation failed due to Stb5p auto-regulating its own expression. Mediation suggests a new candidate gene, DNM1, for a hotspot on Chr XII, whereas instrumental variable methods could not distinguish between multiple genes located within the hotspot. In conclusion, causal inference from genomics and transcriptomics data is a powerful approach for reconstructing causal gene networks, which could be further improved by the development of methods to control for residual correlations in mediation analyses, and for genomic linkage and pleiotropic effects from transcriptional hotspots in instrumental variable analyses.
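
The contrast between instrumental variable and mediation-based reasoning can be illustrated with a small simulation. This sketch uses plain correlations and partial correlations rather than the likelihood-ratio tests implemented in Findr, and all effect sizes, the hidden factor and the function names are hypothetical.

```python
import numpy as np

def partial_corr(x, y, given):
    """Correlation between x and y after regressing `given` (plus an intercept) out of both."""
    D = np.column_stack([np.ones_like(given), given])
    rx = x - D @ np.linalg.lstsq(D, x, rcond=None)[0]
    ry = y - D @ np.linalg.lstsq(D, y, rcond=None)[0]
    return np.corrcoef(rx, ry)[0, 1]

def simulate(confounding, n=20000, seed=3):
    rng = np.random.default_rng(seed)
    E = rng.binomial(2, 0.5, size=n).astype(float)   # cis-eQTL genotype of gene A
    Hc = confounding * rng.normal(size=n)            # hidden factor shared by A and B
    A = E + Hc + rng.normal(size=n)                  # source gene expression
    B = A + Hc + rng.normal(size=n)                  # target gene: A -> B is truly causal
    iv_signal = np.corrcoef(E, B)[0, 1]              # instrumental variable: E associated with B
    mediation_residual = partial_corr(E, B, A)       # mediation: E independent of B given A?
    return iv_signal, mediation_residual

iv_clean, med_clean = simulate(confounding=0.0)
iv_conf, med_conf = simulate(confounding=1.0)
```

Without hidden confounding, the eQTL is associated with B and that association vanishes once A is accounted for, so both approaches support the edge A -> B. With a hidden factor acting on both genes, conditioning on A leaves a residual association between the eQTL and B, so a mediation test would reject a true edge while the instrumental variable signal remains. This is in line with the loss of sensitivity from residual correlations described in the abstract.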


Subjects
Gene Regulatory Networks/genetics; Quantitative Trait Loci/genetics; Saccharomyces cerevisiae Proteins/genetics; Saccharomyces cerevisiae/genetics; Transcription Factors/genetics; Computational Biology; Databases, Genetic; Gene Expression Regulation, Fungal/genetics; Genetic Variation; Genome, Fungal/genetics; Genomics; Models, Genetic
14.
Bioinformatics ; 36(6): 1807-1813, 2020 03 01.
Article in English | MEDLINE | ID: mdl-31688915

ABSTRACT

MOTIVATION: Recently, it has become feasible to generate large-scale, multi-tissue gene expression data, where expression profiles are obtained from multiple tissues or organs sampled from dozens to hundreds of individuals. When traditional clustering methods are applied to this type of data, important information is lost, because they either require all tissues to be analyzed independently, ignoring dependencies and similarities between tissues, or require tissues to be merged into a single, monolithic dataset, ignoring individual characteristics of tissues. RESULTS: We developed a Bayesian model-based multi-tissue clustering algorithm, revamp, which can incorporate prior information on physiological tissue similarity, and which results in a set of clusters, each consisting of a core set of genes conserved across tissues as well as differential sets of genes specific to one or more subsets of tissues. Using data from seven vascular and metabolic tissues from over 100 individuals in the STockholm Atherosclerosis Gene Expression (STAGE) study, we demonstrate that multi-tissue clusters inferred by revamp are more enriched for tissue-dependent protein-protein interactions compared to alternative approaches. We further demonstrate that revamp results in easily interpretable multi-tissue gene expression associations to key coronary artery disease processes and clinical phenotypes in the STAGE individuals. AVAILABILITY AND IMPLEMENTATION: Revamp is implemented in the Lemon-Tree software, available at https://github.com/eb00/lemon-tree. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms; Software; Bayes Theorem; Cluster Analysis; Gene Expression Profiling; Humans
15.
Arterioscler Thromb Vasc Biol ; 39(11): 2386-2401, 2019 11.
Article in English | MEDLINE | ID: mdl-31644355

ABSTRACT

OBJECTIVE: The male-specific region of the Y chromosome (MSY) remains one of the most unexplored regions of the genome. We sought to examine how the genetic variants of the MSY influence male susceptibility to coronary artery disease (CAD) and atherosclerosis. APPROACH AND RESULTS: Analysis of 129 133 men from UK Biobank revealed that only one of 7 common MSY haplogroups (haplogroup I1) was associated with CAD: carriers of haplogroup I1 had an ≈11% increase in risk of CAD when compared with all other haplogroups combined (odds ratio, 1.11; 95% CI, 1.04-1.18; P=6.8×10⁻⁴). Targeted MSY sequencing uncovered 235 variants exclusive to this haplogroup. The haplogroup I1-specific variants showed 2.45- and 1.56-fold respective enrichment for promoter and enhancer chromatin states, in cells/tissues relevant to atherosclerosis, when compared with other MSY variants. Gene set enrichment analysis in CAD-relevant tissues showed that haplogroup I1 was associated with changes in pathways responsible for early and late stages of atherosclerosis development including defence against pathogens, immunity, oxidative phosphorylation, mitochondrial respiration, lipids, coagulation, and extracellular matrix remodeling. UTY was the only Y chromosome gene whose blood expression was associated with haplogroup I1. Experimental reduction of UTY expression in macrophages led to changes in expression of 59 pathways (28 of which overlapped with those associated with haplogroup I1) and a significant reduction in the immune costimulatory signal. CONCLUSIONS: Haplogroup I1 is enriched for regulatory chromatin variants in numerous cells of relevance to CAD and increases cardiovascular risk through proatherosclerotic reprogramming of the transcriptome, partly through UTY.


Subjects
Chromosomes, Human, Y; Coronary Artery Disease/genetics; Genetic Pleiotropy; Genetic Predisposition to Disease; Gene Expression; Haplotypes; High-Throughput Nucleotide Sequencing; Humans; Macrophages/metabolism; Male; Minor Histocompatibility Antigens/genetics; Nuclear Proteins/genetics; Phylogeny; Risk Factors; THP-1 Cells
16.
R Soc Open Sci ; 6(7): 181806, 2019 Jul.
Article in English | MEDLINE | ID: mdl-31417693

ABSTRACT

Wisdom of the crowd, the collective intelligence from responses of multiple human or machine individuals to the same questions, can be more accurate than each individual and improve social decision-making and prediction accuracy. Crowd wisdom estimates each individual's error level and minimizes the overall error in the crowd consensus. However, with problem-specific models mostly concerning binary (yes/no) predictions, crowd wisdom remains overlooked in biomedical disciplines. Here we show, in real-world examples of transcription factor target prediction and skin cancer diagnosis, and with simulated data, that the crowd wisdom problem is analogous to one-dimensional unsupervised dimension reduction in machine learning. This provides a natural class of generalized, accurate and mature crowd wisdom solutions, such as PCA and Isomap, that can handle binary and also continuous responses, like confidence levels. They even outperform supervised-learning-based collective intelligence that is calibrated on historical performance of individuals, e.g. random forest. This study unifies crowd wisdom and unsupervised dimension reduction, and extends its applications to continuous data. As the scales of data acquisition and processing rapidly increase, especially in high-throughput sequencing and imaging, crowd wisdom can provide accurate predictions by combining multiple datasets and/or analytical methods.
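
The analogy between crowd wisdom and one-dimensional dimension reduction can be sketched with simulated continuous responses. This is a minimal example under stated assumptions: the simulated individuals, their noise levels and the choice to standardise each individual's responses before PCA are made up for illustration and are not taken from the study.

```python
import numpy as np

rng = np.random.default_rng(4)
n_questions, n_individuals = 1000, 15
truth = rng.normal(size=n_questions)                       # unknown in practice
noise_sd = np.linspace(0.5, 2.0, n_individuals)            # individuals differ in accuracy
responses = truth + noise_sd[:, None] * rng.normal(size=(n_individuals, n_questions))

# Standardise each individual's responses, then take the first principal component.
Z = (responses - responses.mean(axis=1, keepdims=True)) / responses.std(axis=1, keepdims=True)
_, _, Vt = np.linalg.svd(Z, full_matrices=False)
consensus = Vt[0]
# The sign of a principal component is arbitrary; align it with the plain average.
if np.corrcoef(consensus, Z.mean(axis=0))[0, 1] < 0:
    consensus = -consensus

consensus_accuracy = np.corrcoef(consensus, truth)[0, 1]
best_individual = max(np.corrcoef(r, truth)[0, 1] for r in responses)
```

No ground truth or historical performance is used to form `consensus`; the first principal component weights individuals by how well they agree with the shared signal. In this simulation it tracks the truth more closely than even the most accurate single individual.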

17.
J Am Coll Cardiol ; 73(23): 2946-2957, 2019 06 18.
Article in English | MEDLINE | ID: mdl-31196451

ABSTRACT

BACKGROUND: Genetic variants currently known to affect coronary artery disease (CAD) risk explain less than one-quarter of disease heritability. The heritability contribution of gene regulatory networks (GRNs) in CAD, which are modulated by both genetic and environmental factors, is unknown. OBJECTIVES: This study sought to determine the heritability contributions of single nucleotide polymorphisms affecting gene expression (eSNPs) in GRNs causally linked to CAD. METHODS: Seven vascular and metabolic tissues collected in 2 independent genetics-of-gene-expression studies of patients with CAD were used to identify eSNPs and to infer coexpression networks. To construct GRNs with causal relations to CAD, the prior information of eSNPs in the coexpression networks was used in a Bayesian algorithm. Narrow-sense CAD heritability conferred by the GRNs was calculated from individual-level genotype data from 9 European genome-wide association studies (GWAS) (13,612 cases, 13,758 control cases). RESULTS: The authors identified and replicated 28 independent GRNs active in CAD. The genetic variation in these networks contributed to 10.0% of CAD heritability beyond the 22% attributable to risk loci identified by GWAS. GRNs in the atherosclerotic arterial wall (n = 7) and subcutaneous or visceral abdominal fat (n = 9) were most strongly implicated, jointly explaining 8.2% of CAD heritability. In all, these 28 GRNs (each contributing to >0.2% of CAD heritability) comprised 24 to 841 genes, whereof 1 to 28 genes had strong regulatory effects (key disease drivers) and harbored many relevant functions previously associated with CAD. The gene activity in these 28 GRNs also displayed strong associations with genetic and phenotypic cardiometabolic disease variations both in humans and mice, indicative of their pivotal roles as mediators of gene-environmental interactions in CAD. 
CONCLUSIONS: GRNs capture a major portion of genetic variance and contribute to heritability beyond that of genetic loci currently known to affect CAD risk. These networks provide a framework to identify novel risk genes/pathways and study molecular interactions within and across disease-relevant tissues leading to CAD.


Subjects
Coronary Artery Disease/epidemiology , Coronary Artery Disease/genetics , Gene Regulatory Networks/genetics , Genome-Wide Association Study/methods , Polymorphism, Single Nucleotide/genetics , Adipose Tissue/pathology , Adipose Tissue/physiology , Animals , Coronary Artery Disease/diagnosis , Endothelium, Vascular/pathology , Endothelium, Vascular/physiology , Female , Humans , Male , Mice , Mice, Transgenic , Sweden/epidemiology
18.
PLoS Negl Trop Dis ; 13(4): e0007262, 2019 04.
Article in English | MEDLINE | ID: mdl-30943202

ABSTRACT

Antigenic variation is employed by many pathogens to evade the host immune response, and Trypanosoma brucei has evolved a complex system to achieve this phenotype, involving sequential use of variant surface glycoprotein (VSG) genes encoded from a large repertoire of ~2,000 genes. T. brucei expresses multiple, sometimes closely related, VSGs in a population at any one time, and the ability to resolve and analyse this diversity has been limited. We applied long read sequencing (PacBio) to VSG amplicons generated from blood extracted from batches of mice sacrificed at time points (days 3, 6, 10 and 12) post-infection with T. brucei TREU927. The data showed that long read sequencing is reliable for resolving variant differences between VSGs, and demonstrated that there is significant expressed diversity (449 VSGs detected across 20 mice) and across the timeframe of study there was a clear semi-reproducible pattern of expressed diversity (median of 27 VSGs per sample at day 3 post infection (p.i.), 82 VSGs at day 6 p.i., 187 VSGs at day 10 p.i. and 132 VSGs by day 12 p.i.). There was also consistent detection of one VSG dominating expression across replicates at days 3 and 6, and emergence of a second dominant VSG across replicates by day 12. The innovative application of ecological diversity analysis to VSG reads enabled characterisation of hierarchical VSG expression in the dataset, and resulted in a novel method for analysing such patterns of variation. Additionally, the long read approach allowed detection of mosaic VSG expression from very few reads, the earliest point in infection at which such events have been detected. Therefore, our results indicate that long read analysis is a reliable tool for resolving diverse gene expression profiles, and provides novel insights into the complexity and nature of VSG expression in trypanosomes, revealing significantly higher diversity than previously shown and the ability to identify mosaic gene formation early during the infection process.
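The abstract does not say which ecological diversity measures were applied to the VSG reads. As a rough illustration of the general idea only, the sketch below computes richness, the Shannon index, Pielou's evenness and the dominant variant from a list of per-read VSG assignments. The function names and the VSG labels are hypothetical, and the choice of the Shannon index is an assumption.

```python
import math
from collections import Counter

def shannon_diversity(counts):
    """Shannon index H = -sum(p * ln p) over the non-zero counts."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts)

def diversity_summary(read_labels):
    """Summarize expressed diversity from one VSG label per read."""
    counts = Counter(read_labels)
    richness = len(counts)                      # number of distinct VSGs
    shannon = shannon_diversity(counts.values())
    # Pielou's evenness: 1.0 when all VSGs are equally abundant.
    evenness = shannon / math.log(richness) if richness > 1 else 0.0
    dominant, dominant_reads = counts.most_common(1)[0]
    return {
        "richness": richness,
        "shannon": shannon,
        "evenness": evenness,
        "dominant": dominant,
        "dominant_fraction": dominant_reads / len(read_labels),
    }

# Hypothetical sample: one VSG dominates, two are rare.
sample = ["VSG-A"] * 8 + ["VSG-B"] + ["VSG-C"]
summary = diversity_summary(sample)
```

A sample dominated by a single VSG gives low evenness and a high dominant fraction, which is one simple way to express the hierarchical expression pattern the abstract describes.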


Subjects
Antigenic Variation , Trypanosoma brucei brucei/genetics , Trypanosomiasis, African/immunology , Variant Surface Glycoproteins, Trypanosoma/genetics , Animals , Gene Expression , Gene Expression Profiling , High-Throughput Nucleotide Sequencing , Host-Parasite Interactions , Mice , Variant Surface Glycoproteins, Trypanosoma/immunology
19.
Front Genet ; 10: 1196, 2019.
Article in English | MEDLINE | ID: mdl-31921278

ABSTRACT

Studying the impact of genetic variation on gene regulatory networks is essential to understand the biological mechanisms by which genetic variation causes variation in phenotypes. Bayesian networks provide an elegant statistical approach for multi-trait genetic mapping and modelling causal trait relationships. However, inferring Bayesian gene networks from high-dimensional genetics and genomics data is challenging, because the number of possible networks scales super-exponentially with the number of nodes, and the computational cost of conventional Bayesian network inference methods quickly becomes prohibitive. We propose an alternative method to infer high-quality Bayesian gene networks that easily scales to thousands of genes. Our method first reconstructs a node ordering by conducting pairwise causal inference tests between genes, which then allows a Bayesian network to be inferred via a series of independent variable selection problems, one for each gene. We demonstrate using simulated and real systems genetics data that this results in a Bayesian network with equal, and sometimes better, likelihood than the conventional methods, while having a significantly higher overlap with ground-truth networks and being orders of magnitude faster. Moreover, our method allows for a unified false discovery rate control across genes and individual edges, and thus a rigorous and easily interpretable way for tuning the sparsity level of the inferred network. Bayesian network inference using pairwise node ordering is a highly efficient approach for reconstructing gene regulatory networks when prior information for the inclusion of edges exists or can be inferred from the available data.
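The two-stage idea described above (a node ordering first, then one independent variable selection problem per gene) can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' algorithm: the ordering is approximated by ranking genes on net outgoing pairwise score, and variable selection is a plain least-squares fit with a coefficient threshold standing in for a proper sparse or false-discovery-rate-controlled selection. The function names and the `threshold` parameter are hypothetical.

```python
import numpy as np

def order_from_pairwise(scores):
    """Rank genes from upstream to downstream.

    scores[i, j] is an assumed confidence that gene i regulates gene j,
    e.g. the output of a pairwise causal inference test.
    """
    S = np.asarray(scores, dtype=float)
    net_out = S.sum(axis=1) - S.sum(axis=0)   # outgoing minus incoming
    return [int(i) for i in np.argsort(-net_out, kind="stable")]

def select_parents(expr, order, threshold=0.3):
    """Pick parents for each gene among the genes that precede it.

    expr: array of shape (n_samples, n_genes). Restricting candidates to
    predecessors in the ordering guarantees that the result is acyclic.
    """
    X = np.asarray(expr, dtype=float)
    X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each gene
    parents = {}
    for pos, gene in enumerate(order):
        candidates = order[:pos]
        if not candidates:
            parents[gene] = []
            continue
        # One independent regression per gene; crude thresholding is a
        # stand-in for a real variable selection procedure.
        coef, *_ = np.linalg.lstsq(X[:, candidates], X[:, gene], rcond=None)
        parents[gene] = [c for c, b in zip(candidates, coef) if abs(b) > threshold]
    return parents
```

Because each gene's regression depends only on its predecessors in the ordering, the per-gene problems can be solved independently and in parallel, which is the property the abstract credits for the method's scalability.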

20.
Methods Mol Biol ; 1883: 95-109, 2019.
Article in English | MEDLINE | ID: mdl-30547397

ABSTRACT

Reconstruction of causal gene networks can distinguish regulators from targets and reduce false positives by integrating genetic variations. Recent developments in its speed and accuracy have enabled whole-transcriptome causal network inference on a personal computer. Here, we demonstrate this technique with the program Findr on 3000 genes from the Geuvadis dataset. Subsequent analysis reveals major hub genes in the reconstructed network.


Subjects
Gene Regulatory Networks , Genomics/methods , Models, Genetic , Transcriptome/genetics , Datasets as Topic , Gene Expression Profiling/instrumentation , Gene Expression Profiling/methods , Genetic Variation , Genome, Human/genetics , Genomics/instrumentation , High-Throughput Nucleotide Sequencing , Humans , Single-Cell Analysis/methods , Software